Food Access & Nutrition Equity in Texas
Excel Power Query - Project Workflow and Report
1 Summary
This project aims to develop a data wrangling pipeline that integrates three datasets related to food access, nutritional quality, and socioeconomic disparities in Texas. Using data from the USDA Food Access Research Atlas, CORGIS County Demographics dataset, and CORGIS Food Nutrition dataset. The pipeline is implemented across four technical environments:
- Python with Pandas
- R with tidyverse
- SQL
- Excel
The resulting clean dataset enables exploration of how food accessibility in Texas intersects with nutritional availability and demographic factors such as poverty or race/ethnicity. This work emphasizes the technical process of data wrangling and reproducible pipeline development, providing a foundation for future research into food security and health equity disparities across Texas communities.
2 Data Sources
2.1 USDA Food Access Research Atlas
Source: USDA Data Products
[72535rows x 147 columns]
Provides census-tract-level indicators of supermarket accessibility and food access challenges for different demographic groups. Includes population, housing, income, race, SNAP benefits, and geographic accessibility measures (e.g., low-income populations living >1 mile from a grocery store).
Wrangling Issues:
- Very wide (147 columns) and long (72,000+ rows) dataset requiring subsetting to 10–15 meaningful columns
- Needs standardization of county and state names for merging.
- Requires treatment of missing or 0 placeholder values
- Census tract-level data must be aggregated to county level to match other datasets
2.2 U.S. County Demographics(From 2010s)
Source: CORGIS Dataset Project, County Demographics
[3140rows x 43 columns]
County-level data from 2010–2019 across the U.S., including age distribution, education, employment, ethnicity, household income, housing characteristics, and health-related statistics like travel time and veteran status.
Wrangling Issues:
- Filtering to only Texas counties from national dataset.
- Missing values encoded as -1 need identification and handling.
- Long dot-separated column names (e.g., Ethnicities.White Alone) require renaming and flattening for usability.
- County and state name standardization needed to match with USDA Food Access Atlas
2.3 USDA Food Composition
Source: CORGIS Dataset Project, Food
[7084rows x 38columns]
Contains nutritional breakdowns of thousands of foods, with fields for macronutrients (protein, fat, carbohydrates), vitamins (A, C, B12, etc.), and minerals (calcium, iron, magnesium). Each row represents a distinct food item.
Wrangling Issues:
- Contains 60+ nutrient columns requiring reduction to 5–10 most relevant variables.
- Food names contain formatting synonyms (e.g., “Milk, human” vs “Human milk”) requiring text standardization.
- Grouping by food type (e.g., “Dairy”, “Meat”, “Vegetables”) for analysis.
- Measurements use different units (grams, mg, mcg) that must be documented for proper interpretation.
3 Selected Columns
3.1 USDA Food Access Research Atlas (10 columns)
We retain these 11 columns from the original 147 to capture essential food access metrics while eliminating redundancy. The selected variables focus on the standard 1-mile threshold for urban areas and 10-mile threshold for rural areas, as this represents the USDA’s primary food desert definition. We keep only the two largest racial/ethnic minority groups in Texas (Black and Hispanic populations) to enable disparity analysis without excessive granularity. Geographic identifiers are essential for merging datasets, while population totals serve as denominators for calculating meaningful access percentages at both tract and county levels.
| Column Name | Data Type | Description | Example | Notes |
|---|---|---|---|---|
| State | String | State name | “Texas” | Use for filtering; standardize to “TX” for merging |
| County | String | County name | “Harris County” | May need suffix standardization, merge key with Demographics |
| Urban | Integer (Binary) | Flag indicating if tract is urban (1) or rural (0) | 0, 1 | Based on Census Bureau urban area definitions; use for urban/rural analysis |
| PovertyRate | Float | Percentage of tract population living at or below federal poverty threshold | 0.0 - 100.0 (typically 5-40) | Decimal format (e.g., 15.3 = 15.3%) |
| Pop2010 | Integer | Total population count from 2010 Census | 1,500 - 8,000 (typical tract) | Denominator for all percentage calculations; validate against Demographics dataset |
| TractLOWI | Integer | Total count of low-income population in tract | 0 - 5,000 | Denominator for low-income disparity calculations; low-income defined as ≤200% of poverty line |
| lapop1 | Integer | Population count beyond 1 mile from supermarket | 0 - 6,000 | Use to calculate PercentLowAccess = (lapop1/Pop2010)*100; primary food access indicator |
| lalowi1 | Integer | Low income population count beyond 1 mile from supermarket | 0 - 4,000 | Use to calculate PercentLowIncomeLowAccess = (lalowi1/TractLOWI)*100; measures vulnerable population access |
| lablack1 | Integer | Black/African American population count with low access | 0 - 3,000 | Numerator for Black disparity ratio; compare to county-level PercentBlack |
| lahisp1 | Integer | Hispanic/Latino population count with low access | 0 - 4,000 | Numerator for Hispanic disparity ratio; compare to county-level PercentHispanic |
3.2 U.S. County Demographics (10 columns)
We select these 10 columns from the original 43 to provide socioeconomic context for food access patterns without overwhelming the analysis. The focus is on variables directly relevant to our research questions: income and education as economic indicators, age structure to identify vulnerable populations, and detailed racial/ethnic composition to enable disparity calculations. We exclude employment, housing, and business ownership variables as they are less directly related to food access outcomes. The 2010 population figures align temporally with the USDA food access data for valid comparisons.
| Column Name | Data Type | Description | Example | Notes |
|---|---|---|---|---|
| County | String | County name | “Harris County” | Merge key with USDA, need to add/remove “County” suffix for consistency |
| State | String | State abbreviation or name | “TX” or “Texas” | Standardize to match USDA format (“TX” recommended) |
| Population.2010 Population | Integer | County population from 2010 Census | 825 - 4,000,000 | Rename to: Population2010 |
| Population.Population per Square Mile | Float | Population density | 1.5 - 3,000+ | Rename to: PopDensity |
| Income.Median Houseold Income | Integer | Median household income (2015-2019 ACS) | $30,000 - $100,000+ | Rename to: MedianIncome |
| Education.Bachelor’s Degree or Higher | Float | Percentage of adults 25+ with bachelor’s degree or higher (2015-2019 ACS) | 8.0 - 60.0 | Rename to: BachelorsDegreeRate |
| Age.Percent Under 18 Years | Float | Percentage of population under age 18 | 15.0 - 35.0 | Rename to: PercentUnder18 |
| Ethnicities.Black Alone | Float | Percentage of population identifying as Black/African American alone | 0.5 - 50.0 | Rename to: PercentBlack |
| Ethnicities.Hispanic or Latino | Float | Percentage of population identifying as Hispanic/Latino (any race) | 5.0 - 95.0 | Rename to: PercentHispanic, key disparity metric |
| Ethnicities.White Alone | Float | Percentage of population identifying as White alone | 10.0 - 90.0 | Rename to: PercentWhite |
3.3 USDA Food Composition (9 columns)
We retain these 9 columns from the original 38 to create a focused nutritional profile without excessive micronutrient detail. The selection emphasizes macronutrients that define food quality—protein for satiety, fiber as a health indicator, and sugar as a marker of processed foods. We keep only two micronutrients (Vitamin A and calcium) as they are most commonly deficient in food desert populations and represent broader nutritional adequacy. This streamlined approach enables clear categorization of “nutrient-dense” versus “empty calorie” foods while avoiding the analytical complexity of tracking dozens of vitamins and minerals.
| Column Name | Data Type | Description | Example | Notes |
|---|---|---|---|---|
| Category | String | General food category assigned by USDA | “Milk”, “Beef Product” | Use for grouping in nutrition |
| Description | String | Full description of food item | “Milk, whole, 3.25% milkfat” | May contain formatting inconsistencies |
| Data.Protein | Float | Protein content | 0.0 - 90.0(g) | Rename to: Protein; high values in meat, fish, legumes |
| Data.Fiber | Float | Dietary fiber content | 0.0 - 40.0(g) | Rename to: Fiber; quality indicator |
| Data.Sugar Total | Float | Total sugar content | 0.0 - 100.0(g) | Rename to: SugarTotal; quality indicator; high = worse (candies, sodas) |
| Data.Fat.Total Lipid | Float | Total fat content | 0.0 - 100.0(g) | Rename to: TotalFat; high in oils, nuts, fatty meats |
| Data.Vitamins.Vitamin A - RAE | Integer | Vitamin A content as Retinol Activity Equivalents | 0 - 20,000+(mcg) | Rename to: VitaminA; high in orange vegetables, dairy, liver |
| Data.Major Minerals.Calcium | Integer | Calcium content | 0 - 2,000+(mg) | Rename to: Calcium; high in dairy, leafy greens, fortified foods |
4 Data Loading
We create a new Excel workbook and import the three raw CSV files into Power Query.
Process: Data tab → Get Data → From File → From Text/CSV → Import → click Transform Data
Challenge: Because our dataset are massive, here we only create connection to these three raw dataset.
5 Clean and Subset the USDA Food Access Research Atlas (Tract Level)
5.1 Keep only the 10 required columns `
We keep only 10 necessary columns for the following wrangling process(`Select)
Ambiguous Columns:
- PovertyRate : Percentage of tract population living at or below federal poverty threshold
- lapop1 : Population count beyond 1 mile from supermarket
- lalowi1 : Low income population count beyond 1 mile from supermarket
- lablack1 : Black/African American population count with low access
- lahisp1 : Hispanic/Latino population count with low access
- TractLOWI : Total count of low-income population in tract
5.2 Handling Missing Value
Process: Choose Remove Rows > Remove Blank Rows
5.3 Filter to Texas only
Process: Click the dropdown arrow on State → Text Filters → Equals… → type Texas(Filter)
5.4 Rename columns for clarity
Goal: Simplify column names for readability.
Process: Double-click each header
| Old Column Name | New Column Name |
|---|---|
| State | State |
| County | County |
| Urban | Urban |
| Pop2010 | Pop2010 |
| PovertyRate | PovertyRate |
| TractLOWI | Tract_LowIncome |
| lapop1 | LowAccess_Pop |
| lalowi1 | LowIncome_LowAccess |
| lablack1 | Black_LowAccess |
| lahisp1 | Hispanic_LowAccess |
5.4.1 Set correct data types
We noticed 4 “LowAccess” columns data types is Text, which might caused later wrangling issues, so we change its into Whole Number
- Pop2010, TractLowIncome, all four “LowAccess” columns → Whole Number
- PovertyRate → Decimal Number
5.5 Handle data types and save as query: TX_FoodAccess_Tracts
Process:In the left pane, right-click the query name → Rename → TX_FoodAccess_Tracts
Notes
- We rename this Power Query Table as
TX_FoodAccess_Tracts(Left-Top Panel) - The Applied Step(Right Panel) section document our cleaning process.
6 Clean and Subset the County Demographics Dataset
6.1 Keep only the 10 required demographic columns Select
Process: We keep only 10 necessary columns for the following wrangling process(Select)
Ambiguous Columns
Income.Median Houseold Income : This includes the income of the householder and all other individuals 15 years old and over in the household whether they are related to the householder or not. Because many households consist of only one person average household income is usually less than average family income.
Population.Population per Square Mile : Population density
6.2 Replace -1 values with null (missing value handling)
Goal: County Demographics - Check for -1 to NULL, then Remove
Process: Select all columns → Replace Values → Value To Find: -1 → Replace With: (leave blank)
Then we choose Remove Rows and Remove Blank Rows
6.3 Filter to Texas (State = “TX”) >
Process: Click the dropdown arrow on State → Text Filters → Equals… → type TX (filter,replace)
6.3.1 Standardizing Geographic Names(align with USDA state = “Texas”)
Goal: Ensure “TX” and “Texas” match across datasets.
Process: In this dataset, right-click the State column > Replace Values > Find: TX, Replace With: Texas.
6.4 Rename columns using clean names
Goal: Simplify column names for readability.
Process: Double-click each header
| Old long name | New short name |
|---|---|
| County | County |
| State | State |
| Population.2010 Population | Population2010 |
| Population.Population per Square Mile | PopDensity |
| Income.Median Houseold Income | MedianIncome |
| Education.Bachelor’s Degree or Higher | BachelorsRate |
| Age.Percent Under 18 Years | Pct_Under18 |
| Ethnicities.Black Alone | Pct_Black |
| Ethnicities.Hispanic or Latino | Pct_Hispanic |
| Ethnicities.White Alone | Pct_White |
## Correct data types and save as query: TX_County_Demographics
6.4.1 Set correct data types
Here we set Population2010 → Whole Number, everything else → Decimal Number or Percentage as appropriate
Process:In the left pane, right-click the query name → Rename → TX_County_Demographics
7 Clean and Subset the USDA Food Nutrition Dataset
7.1 Keep only the 9 essential nutrient columns Select
Process: We keep only 9 necessary columns for the following wrangling process(→ Right-click → Remove Other Columns)
7.2 Remove rows with missing Category
Process: Choose Remove Rows > Remove Blank Rows
7.3 Rename columns for consistency
Goal: Simplify column names for readability.
Process: Double-click each header.
Since all columns in this raw dataset columns are messy, we need to standardize all the columns.
| Old Column Name | New Column Name |
|---|---|
| Category | Category |
| Description | FoodName |
| Data.Protein | Protein |
| Data.Carbohydrate | Carbohydrate |
| Data.Fiber | Fiber |
| Data.Sugar Total | Sugar |
| Data.Fat.Total Lipid | TotalFat |
| Data.Vitamins.Vitamin A - RAE | VitaminA_RAE |
| Data.Major Minerals.Calcium | Calcium |
7.4 Pivot/Unpivot
Goal: Unpivot (Wide to Long)
Process:
- Select the identifier column > catgory
- Go to the Transform tab, click Unpivot Columns, and choose Unpivot Only Selected Columns
Outcome: Power Query will create two new columns: Attribute (Nutrition) and Value
Goal: Pivot (Long → Wide)
Process:
- Pick columns to turn into headers (
Attribute) - Go to the Transform tab and choose Pivot Column
- Select the columns to fill in (
Value)
Outcome: Power Query will rearrange the table into wide format, creating separate columns.
7.5 Save final nutrition table as query: Food_Nutrition
Process:In the left pane, right-click the query name → Rename → Food_Nutrition_Final
8 Data Transformation
8.1 Calculate Derived Metrics (USDA Atlas)
Calculate PercentLowAccess (% of tract population >1 mile from supermarket)
Definition: What percentage of the tract’s total population lives more than 1 mile from a supermarket?
Process: Add Column > Custom Column > enter your calculation > Pct_LowAccess
Example: * Census Tract A has POP2010 = 5,000 people total * lapop1 = 1,500 people live >1 mile from supermarket * PercentLowAccess = (1,500 / 5,000) × 100 = 30% * Interpretation: 30% of this tract’s population has low access to food
Pct_LowIncomeLowAccess (% of low-income population with low access):
Definition: Among the low-income population in this tract, what percentage also has low food access?
Example: * Census Tract B has Tract_LowIncome = 2,000 low-income people * lowIncome_LowAccess = 800 low-income people with low access * Pct_LowIncomeLowAccess = (800 / 2,000) × 100 = 40% * Interpretation: 40% of low-income residents have low food access
Why These Calculations Matter Without percentages (raw counts only):
- Tract 1: 1,000 people with low access (sounds bad)
- Tract 2: 500 people with low access (sounds better)
BUT if we look at percentages
- Tract 1: 1,000 out of 10,000 = 10% low access (not too bad)
- Tract 2: 500 out of 1,000 = 50% low access (much worse!)
Percentages allow fair comparisons between large urban tracts and small rural tracts.
8.1.1 Challenge
Here we found out two error that shouldn’t existed: 1. The Pct_LowIncomeLowAccess value over 100, which means LowIncome_LowAccess bigger than Tract_LowIncome 2. NULL pop out in some rows, because we only handle the blank value at the previous step.
8.1.2 Solution
filter> Number Filters > Keep Less Than or Equal to 100 > Also delete the rows that contains Null or N/A
8.2 Categorize Nutrition (Food Nutrition) Group By
8.2.1 Group by Category and Calculate Avearge Nutritional Values
8.2.2 Categorize Nutritional Values
Process: Add Column > Conditional Column
8.2.2.1 SugarTotal
High sugar intake is linked to health issues; categorizing helps identify high-sugar vs low-sugar foods.
Thresholds:
- High: > 15 grams (>25% of 50g daily max)
- Medium: 5-15 grams
- Low: < 5 grams
8.2.2.2 Protein
Helps identify protein-rich foods vs lower-protein options.
Thresholds (per 100g serving):
- High: > 15 grams (excellent protein source)
- Medium: 5-15 grams
- Low: < 5 grams
8.2.2.3 TotalFat
Important for understanding nutritional quality and health implications.
Thresholds (per 100g serving):
- High: > 20 grams
- Medium: 5-20 grams
- Low: < 5 grams
8.2.3 Create Final Food_Nutrition dataset
We Remove Protein, SugarTotal, and TotalFat columns, keep only their categorical levels.
Finalize Food_Nutrition_Final
9 Aggregate Census Tracts to County Level
Goal: Aggregate census tract data up to the county level(Group By)
9.1 Preparation for Weighted Average
Before grouping, we must create the weighting numerator.
Process:add a Custom Column > Poverty_Weighted_Num
9.2 Group By County
Process: Transform > Group By > alculate county-level totals and averages
Group by: County and State.
- New Column 1:
TotalPopulation, Operation: Sum, Column: Pop2010. - New Column 2:
TotalLowAccessPop, Operation: Sum, Column: LowAccess_Pop - New Column 3:
TotalLowIncomePop, Operation: Sum, Column: Tract_LowIncome - New Column 4:
TotalLowIncomeLowAccessPop, Operation: Sum, Column: LowIncome_LowAccess - New Column 5:
Sum_Poverty_Weighted, Operation: Sum, Column: Poverty_Weighted_Num - New Column 6:
TotalBlackLowAccess, Operation: Sum, Column: Black_LowAccess - New Column 7:
TotalHispanicLowAccess, Operation: Sum, Column: Hispanic_LowAccess - New Column 8:
CountUrbanTracts, Operation: Sum, Column: Urban - New Column 9:
TotalTracts, Operation: Count, Column: Urban
9.3 Add Custom Columns for key percentages (Pct_LowAccess, Pct_LowIncome_LowAccess, Pct_Urban)
9.3.1 Calculating Weighted Averages
Goal: Finish the weighted average calculation on the aggregated data .
Change the data type of this new column to Decimal Number.
9.3.2 Add Custom Columns
Process: Add Column > Custom Column
Add three new columns, we calculated three key percentage metrics:
- Pct_LowAccess: The percentage of the county population living more than 1 mile from a supermarket
- Pct_LowIncome_LowAccess: The percentage of low-income residents who also face low food access
- Pct_Urban: The percentage of census tracts in the county classified as urban
## Save as query: TX_FoodAccess_County_Agg
Process: Rename query → TX_FoodAccess_County_Agg
Outcome
A dataset where all counties can be directly compared regardless of their population size, making patterns in food access and poverty more visible.
10 Data Merging
Process: Home > Merge Queries > Merge Queries As New
inner merge on County and State columns, ensuring that only counties present in both datasets were included.
This merge operation links food access metrics (like percentage of population with low access) with demographic factors (like median income and education levels).
Outcome
A unified dataset containing both food access metrics and demographic information for 254 Texas counties.
10.1 Select Only Essential Columns for Final Dataset
We carefully selected columns that directly address our research questions about food access disparities:
- Geographic identifiers: County, State
- Population: TotalPopulation, PopDensity
- Food Access: Pct_LowAccess, Pct_LowIncome_LowAccess, Pct_Urban
- Economic: AvgPovertyRate, MedianIncome
- Education: BachelorsRate
- Demographics: Pct_Hispanic, Pct_Black
11 Export Final Clean Datasets
11.1 Load Food_Nutrition_Final to worksheet and export as Food_Nutrition_Final.csv
11.2 Load Texas_County_Merged to worksheet and export as Texas_County_Merged.csv
12 Analysis with Visualizations
12.1 Examining the Poverty-Food Access Relationship
12.2 What This Chart Represents
This scatter plot visualizes the relationship between economic poverty and physical food access across Texas counties.
- X-Axis (Average Poverty Rate): This measures the weighted average poverty rate for each county.
- Y-Axis (% Low-Income Population with Low Access): This measures the percentage of low-income residents who live far from a supermarket (more than 1 mile in urban areas or 10 miles in rural areas).
- Each Dot: Represents one individual county in Texas (out of ~254 counties).
12.3 Key Findings from the Data
As noted in this report’s analysis section, this chart reveals several counter-intuitive insights:
No Strong Link Between Poverty and Access: The red dashed trend line is nearly flat, showing a very weak negative correlation (-0.073). This means that knowing a county’s poverty rate does not help you predict how difficult it is for low-income residents to access food there.
High Variability: You can see a massive spread in the orange dots. For counties with a 15% poverty rate, the percentage of low-income people with low food access ranges wildly from 20% to over 100%. This suggests that local geography (urban vs. rural) likely plays a bigger role than poverty rates alone.
Even “Rich” Counties Have Access Issues: Look at the far left of the chart (low poverty rates <15%). There are still many dots high up on the Y-axis (90–100%), indicating that even in relatively wealthy counties, low-income populations often face significant barriers to accessing supermarkets.
12.4 Summary
This chart disproves the assumption that food access problems are strictly tied to the overall poverty level of a county. Instead, it shows that “food deserts” for low-income people exist across the entire economic spectrum in Texas.
13 Challenge and How We Approached
Working through this three-dataset food accessibility project in Excel presented a steady stream of challenges, moving from initial data loading through complex Power Query transformations. Though the individual steps utilized standard interface tools, the cumulative effect of wrangling three large, messy datasets taught us that data wrangling is fundamentally about systematic pipeline construction, not just manual cell editing.
Our first major challenge hit us during the initial data loading step. The USDA Food Access Research Atlas arrived with over 72,000 rows and 147 columns. Loading this directly into a standard spreadsheet made navigation sluggish and overwhelming. We couldn’t immediately visually scan which columns mattered for our Texas-focused analysis. Instead of dumping all raw data into sheets, we were forced to adopt a “Connection Only” strategy. We loaded the data into Power Query without materializing it on the grid, allowing us to examine the headers and apply a disciplined subsetting approach inside the editor. By selecting only the 10–15 core variables relevant to food deserts and low-income populations before the data ever hit the worksheet, we achieved an 86.8% dimension reduction, transforming 228 total columns into just 30 managed variables across the three datasets.
Once we had manageable connections, we encountered the obstacle of inconsistent missing value representations. The County Demographics dataset used -1 as a missing value placeholder, while the USDA Atlas mixed genuine zeros with missing data. We realized we couldn’t simply rely on Power Query’s automated “Remove Empty” functions blindly. Instead, we adopted a hybrid approach: we used Excel’s standard filtering tools to visually identify and delete the -1 outliers in the demographics CSV before import, but relied on Power Query’s conditional logic to handle nulls in the nutrition data. This taught us that cleaning isn’t just about clicking a button; it requires understanding whether a “0” means “no data” or “zero population.”
The census tract-to-county mapping challenge emerged during our aggregation phase. The Food Access Atlas operates at the census tract level, while our demographics data was at the county level. We initially worried about losing granularity by aggregating tracts. The technical challenge here was configuring the “Group By” settings in Power Query correctly. We had to ensure we didn’t just count rows, but specifically summed population counts while carefully calculating weighted averages for poverty rates. We solved this by verifying our row counts before and after the “Group By” step, ensuring the final output matched the 254 counties expected in Texas.
Perhaps our most frustrating challenge involved inconsistent naming conventions across datasets. County names appeared with and without “County” suffixes, and state representations alternated between “TX” and “Texas”. Early on, we considered manually finding and replacing these in the Excel grid, which would have been error-prone and non-reproducible. The breakthrough came when we committed to using Power Query’s “Replace Values” and “Format > Capitalize Each Word” transformation steps. By building these steps into the query itself, we ensured that if the data were reloaded, the “TX” to “Texas” standardization would happen automatically, ensuring perfect keys for our merges.
Our final major challenge involved determining the correct merge configuration. With three datasets from different sources, we had to map out our keys carefully. We settled on a multi-column merge strategy, linking Food Access and County Demographics using both “County” and “State” columns simultaneously to prevent mismatching counties with identical names in other states. The trickiest part was ensuring we used an “Inner Join” type. This setting acted as our final filter, ensuring that only records existing in both cleaned datasets (specifically our Texas subset) made it to the final table, automatically dropping any residual non-Texas data.
Looking back, we are particularly proud of an aspect of this project.
- The “Query-First” Strategy: By doing all our filtering, cleaning, and merging inside Power Query’s “Connection Only” environment, we kept our Excel workbook lightweight and clean. Rather than having messy tabs full of raw calculations, we produced a final workbook with just the single, clean, analysis-ready table.
Throughout this project, we learned that data wrangling in Excel isn’t about manually manipulating cells. It is about building a reproducible pipeline. The experience of configuring “Group By” operations and multi-column merges taught us that success comes from a verification-oriented mindset—checking the “Preview” pane at every step to ensure the transformation did exactly what we intended.
14 Description of tool learning
We learned Power Query as a powerful ETL (Extract, Transform, Load) tool for handling data without writing code. We started by exploring the “Get Data” ribbon to import massive CSV files. A significant lesson was learning how to handle datasets that exceed Excel’s row limit. We searched for solutions and discovered how to load data as a “Connection Only” rather than loading it directly into the worksheet. Furthermore, We learned how to transform data structures by exploring the “Transform” tab, specifically finding and applying the Unpivot function to normalize the food nutrition dataset. This tool proved handy for initial data inspection and structural cleaning before detailed analysis.